Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits

نویسندگان

  • Long Tran-Thanh
  • Archie C. Chapman
  • Alex Rogers
  • Nicholas R. Jennings
چکیده

In budget–limited multi–armed bandit (MAB) problems, the learner’s actions are costly and constrained by a fixed budget. Consequently, an optimal exploitation policy may not be to pull the optimal arm repeatedly, as is the case in other variants of MAB, but rather to pull the sequence of different arms that maximises the agent’s total reward within the budget. This difference from existing MABs means that new approaches to maximising the total reward are required. Given this, we develop two pulling policies, namely: (i) KUBE; and (ii) fractional KUBE. Whereas the former provides better performance up to 40% in our experimental settings, the latter is computationally less expensive. We also prove logarithmic upper bounds for the regret of both policies, and show that these bounds are asymptotically optimal (i.e. they only differ from the best possible regret by a constant factor).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ǫ–First Policies for Budget–Limited Multi-Armed Bandits

We introduce the budget–limited multi–armed bandit (MAB), which captures situations where a learner’s actions are costly and constrained by a fixed budget that is incommensurable with the rewards earned from the bandit machine, and then describe a first algorithm for solving it. Since the learner has a budget, the problem’s duration is finite. Consequently an optimal exploitation policy is not ...

متن کامل

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

We consider a contextual version of multi-armed bandit problem with global knapsack constraints. In each round, the outcome of pulling an arm is a scalar reward and a resource consumption vector, both dependent on the context, and the global knapsack constraints require the total consumption for each resource to be below some pre-fixed budget. The learning agent competes with an arbitrary set o...

متن کامل

Multi - armed restless bandits , index policies , and dynamic priority allocation

This paper presents a brief introduction to the emerging research field of multi-armed restless bandits (MARBs), which substantially extend the modeling power of classic multi-armed bandits. MARBs are Markov decision process models for optimal dynamic priority allocation to a collection of stochastic binary-action (active/passive) projects evolving over time. Interest in MARBs has grown steadil...

متن کامل

Budgeted Bandit Problems with Continuous Random Costs

We study the budgeted bandit problem, where each arm is associated with both a reward and a cost. In a budgeted bandit problem, the objective is to design an arm pulling algorithm in order to maximize the total reward before the budget runs out. In this work, we study both multi-armed bandits and linear bandits, and focus on the setting with continuous random costs. We propose an upper confiden...

متن کامل

Almost Optimal Exploration in Multi-Armed Bandits

We study the problem of exploration in stochastic Multi-Armed Bandits. Even in the simplest setting of identifying the best arm, there remains a logarithmic multiplicative gap between the known lower and upper bounds for the number of arm pulls required for the task. This extra logarithmic factor is quite meaningful in nowadays large-scale applications. We present two novel, parameterfree algor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1204.1909  شماره 

صفحات  -

تاریخ انتشار 2012